Assessing Disclosure Risk in Anonymized Datasets
Abstract
Sharing log data is a valuable step toward improving network security. However, logs often contain sensitive information, so organizations are hesitant to share them. Anonymization methods increase protection by lowering the disclosure risk to a level considered safe. A metric for anonymity is therefore needed to assess the risk quantitatively before log data is released. In this paper, we propose a general framework for estimating disclosure risk using the conditional entropy between the original and the anonymized datasets. We demonstrate our approach using network log files.
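As a rough illustration of the quantity named in the abstract, the sketch below estimates the conditional entropy H(X | Y) of an original field X given its anonymized counterpart Y from paired samples. It is not the paper's framework, which is not described here; the function name `conditional_entropy`, the plug-in frequency estimator, and the toy IP addresses are all assumptions made for illustration. Under this reading, a value near zero means the anonymized value almost determines the original (high disclosure risk), while a larger value means more uncertainty remains for an observer.

```python
import math
from collections import Counter


def conditional_entropy(original, anonymized):
    """Estimate H(X | Y) in bits from paired samples (x_i, y_i).

    X is the original value, Y the anonymized value. Probabilities are
    plain empirical frequencies (an assumption of this sketch).
    """
    if len(original) != len(anonymized):
        raise ValueError("original and anonymized must have the same length")
    n = len(original)
    if n == 0:
        return 0.0

    joint = Counter(zip(original, anonymized))  # counts of (x, y) pairs
    marginal_y = Counter(anonymized)            # counts of y alone

    h = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n                        # p(x, y)
        p_x_given_y = count / marginal_y[y]     # p(x | y)
        h -= p_xy * math.log2(p_x_given_y)
    return h


# Hypothetical toy data: four source addresses from a log.
original_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Scheme A (assumed): truncate every address to its /24 prefix.
truncated = ["10.0.0.0"] * 4

# Scheme B (assumed): one-to-one pseudonyms, which keep records distinguishable.
pseudonyms = ["a", "b", "c", "d"]

h_truncated = conditional_entropy(original_ips, truncated)
h_pseudonyms = conditional_entropy(original_ips, pseudonyms)
```

In this toy case truncation leaves two bits of uncertainty about the original address, whereas the one-to-one pseudonyms leave none, assuming the observer knows the mapping. A realistic assessment would also have to model what the adversary actually knows, which this sketch does not attempt.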
Similar resources
Disclosure Risk Measurement of Anonymized Datasets after Probabilistic Attacks
We present a unified metric for analyzing the risk of disclosing anonymized datasets. Datasets containing privacy-sensitive information are often required to be shared with unauthorized users for utilization of valuable statistical properties of the data. Anonymizing the actual data provides a great opportunity to share the data while preserving its statistical properties and privacy. The risk ...
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Disclosure Risk and Sample of Anonymized Records
The disclosure problem relates to the possibility of identifying individuals in the released statistical information. The paper evaluates the disclosure risk on a 3% sample of individual data from the Slovene 1991 Population Census. The concept of uniqueness is used for this purpose. The level of regional aggregation, the number of identifying variables and the grouping of the categories are...
Evaluation of the disclosure risk of masking methods dealing with textual attributes
Record linkage methods evaluate the disclosure risk of revealing confidential information in anonymized datasets that are publicly distributed. Concretely, they measure the capacity of an intruder to link records in the original dataset with those in the masked one. In the past, masking and record linkage methods have been developed focused on numerical or ordinal data. Recently, motivated by t...
Community Detection in Anonymized Social Networks
Social media and social networks are embedded in our society to a point that could not have been imagined only ten years ago. Facebook, LinkedIn, and Twitter are already well known social networks that have a large audience in all age groups. Recently more trendy social sites such as Pinterest, Instagram, Vine, Tumblr, WhatsApp, and Snapchat are being preferred by the younger audience. The amou...